Leveraging Asynchronicity in Gradient Descent for Scalable Deep Learning
Abstract
In this paper, we present multiple approaches for improving the performance of gradient descent when utilizing multiple compute resources. The proposed approaches span a solution space ranging from strict equivalence with execution on a single compute device to delaying gradient updates by a fixed number of iterations. We present a new approach, asynchronous layer-wise gradient descent, that maximizes the overlap of layer-wise backpropagation (computation) with gradient synchronization (communication). This approach provides maximal theoretical equivalence to the de facto gradient descent algorithm, requires limited asynchronicity across multiple iterations of gradient descent, theoretically improves overall speedup, and minimizes the additional space requirements for asynchronicity. We implement all of our proposed approaches using Caffe, a high-performance deep learning library, and evaluate them on both an Intel Sandy Bridge cluster connected with InfiniBand and an NVIDIA DGX-1 connected with NVLink. The evaluations are performed on a set of well-known workloads, including AlexNet and GoogleNet on the ImageNet dataset. Our evaluation of these neural network topologies indicates that asynchronous gradient descent achieves a speedup of up to 1.7x over synchronous gradient descent.
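The core overlap idea can be illustrated with a minimal sketch. This is not the authors' Caffe implementation: the allreduce() stand-in, the thread-per-layer scheme, and all names below are illustrative assumptions; a real implementation would use non-blocking MPI or NCCL collectives over InfiniBand or NVLink.

```python
import threading
import numpy as np

def allreduce(grad):
    # Stand-in for a cross-node gradient sum (e.g. an MPI/NCCL allreduce);
    # with a single process it is simply the identity.
    return grad

def async_layerwise_sgd_step(weights, layer_grad_fns, lr=0.01):
    # Backpropagation visits layers in reverse order (output layer first).
    # As soon as one layer's gradient is ready, its synchronization and
    # weight update run on a helper thread while backpropagation of the
    # next layer continues on the main thread.
    pending = []
    for i in reversed(range(len(weights))):
        g = layer_grad_fns[i]()                 # computation: this layer's gradient
        def sync_and_update(i=i, g=g):
            weights[i] -= lr * allreduce(g)     # communication, then update
        t = threading.Thread(target=sync_and_update)
        t.start()                               # overlaps with the next layer's backprop
        pending.append(t)
    for t in pending:                           # all layers synchronized before the
        t.join()                                # next iteration begins

# Toy usage with two layers; the gradient functions are random placeholders.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 4)) for _ in range(2)]
grad_fns = [lambda: rng.standard_normal((4, 4)) for _ in range(2)]
async_layerwise_sgd_step(weights, grad_fns)
```

Joining all helper threads before the next iteration is what keeps this variant equivalent to synchronous gradient descent; the asynchronicity is confined to overlap within a single iteration.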
Similar Articles
Scalable Planning with Tensorflow for Hybrid Nonlinear Domains
Given recent deep learning results that demonstrate the ability to effectively optimize high-dimensional non-convex functions with gradient descent optimization on GPUs, we ask in this paper whether symbolic gradient optimization tools such as Tensorflow can be effective for planning in hybrid (mixed discrete and continuous) nonlinear domains with high-dimensional state and action spaces. To thi...
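The idea being probed can be sketched in a few lines: treat the action sequence as the decision variable and run plain gradient descent on the rollout cost. The linear dynamics, quadratic cost, and hand-derived gradient below are illustrative assumptions, not the paper's domains or its Tensorflow code.

```python
import numpy as np

T, goal = 20, np.array([3.0, -2.0])
actions = np.zeros((T, 2))                  # the plan: one 2-D action per step

def cost_and_grad(actions):
    # dynamics: s_{t+1} = s_t + a_t from s_0 = 0; cost = action effort
    # plus squared distance of the final state from the goal
    s_T = actions.sum(axis=0)
    cost = (actions ** 2).sum() + ((s_T - goal) ** 2).sum()
    grad = 2 * actions + 2 * (s_T - goal)   # d(cost)/d(a_t); the terminal
    return cost, grad                       # term couples all time steps

for _ in range(200):                        # plain gradient descent on the plan
    _, grad = cost_and_grad(actions)
    actions -= 0.02 * grad

final_cost, _ = cost_and_grad(actions)
print(f"final cost {final_cost:.4f}, final state {actions.sum(axis=0)}")
```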
A Hybrid Optimization Algorithm for Learning Deep Models
Deep learning is a subset of machine learning that is widely used in Artificial Intelligence (AI) fields such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimization problem. In deep learning, the most important problem that can be solved by optimization is neural n...
SparkNet: Training Deep Networks in Spark
Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this reason, leveraging the resources of a cluster to speed up training is an important area of work. However, widely popular batch-processing computational frameworks like MapReduce and Spark were not designed to support the asynchronous and communication-intensi...
A Geometric Approach to Leveraging
AdaBoost is a popular and effective leveraging procedure for improving the hypotheses generated by weak learning algorithms. AdaBoost and many other leveraging algorithms can be viewed as performing a constrained gradient descent over a potential function. At each iteration, the distribution over the sample given to the weak learner is the direction of steepest descent. We introduce a new leverag...
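A small numeric sketch of the gradient-descent view described above (an illustration, not the paper's algorithm): with the exponential potential Phi(F) = sum_i exp(-y_i F(x_i)), the partial derivative with respect to each margin is -y_i exp(-y_i F(x_i)), so the distribution AdaBoost hands the weak learner, w_i proportional to exp(-y_i F(x_i)), weights examples by the magnitude of the potential's gradient. The labels and scores below are made up for illustration.

```python
import numpy as np

y = np.array([+1, -1, +1, +1])       # labels
F = np.array([0.5, 0.2, -0.1, 1.3])  # current combined hypothesis values F(x_i)

margins = y * F
w = np.exp(-margins)                 # |dPhi/dF_i| for the exponential potential
w /= w.sum()                         # normalized distribution for the weak learner
print(w)                             # low-margin (hard) examples get more weight
```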